This goes into some of the material in Lab 2 in more depth.

load("landings.RData")
landings$log.metric.tons = log(landings$metric.tons)
landings = subset(landings, Year <= 1989)
anchovy = subset(landings, Species=="Anchovy")$log.metric.tons
require(urca)
require(ggplot2)
require(tseries)
require(forecast)
require(reshape2)
require(gridExtra)

A random walk

A random walk is non-stationary: its variance increases with time. We can see this if we plot many random walks on top of each other.

A single random walk can be made with a simple for loop or with a cumsum on white noise.

# this is a random walk
TT <- 100
sd <- 0.2
rw <- rep(0,TT)
for(i in 2:TT) rw[i] <- rw[i-1]+rnorm(1, sd=sd)
# and this is a random walk
rw <- cumsum(rnorm(TT, sd=sd))

Let’s plot 100 random walks on top of each other. Notice how the variance (the range of values on the y-axis) grows the farther we get from the start.

nsim <- 100
plot(rw, type="l", ylim=c(-4,4))
for(i in 1:nsim){
  rw <- cumsum(rnorm(TT, sd=sd))
  lines(rw)
}
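As a quick numerical check (a sketch, not part of the original lab), we can simulate many walks and compare the empirical variance across simulations at a few time points with the theoretical variance of a random walk, \(t \sigma^2\):

```r
set.seed(123)
TT <- 100; sd <- 0.2; nsim <- 1000
# each column of rws is one simulated random walk
rws <- replicate(nsim, cumsum(rnorm(TT, sd = sd)))
# empirical variance across simulations at t = 10, 50, 100
apply(rws[c(10, 50, 100), ], 1, var)
# theoretical variance of a random walk at time t is t * sd^2
c(10, 50, 100) * sd^2
```

The empirical variances should be close to 0.4, 2, and 4, growing linearly in \(t\).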

Optional We can make a similar plot with ggplot.

rw <- rep(0,TT)
err <- rnorm(TT, sd=sd)
for(i in 2:TT) rw[i] <- rw[i-1]+err[i]
dat <- data.frame(t=1:TT, rw=rw)
p1 <- ggplot(dat, aes(x=t, y=rw)) + geom_line() + 
  ggtitle("Random Walk") + xlab("") + ylab("value")
rws <- apply(matrix(rnorm(TT*nsim, sd=sd),TT,nsim),2,cumsum)
rws <- data.frame(rws)
rws$id <- 1:TT

rws2 <- melt(rws, id.var="id")
p2 <- ggplot(rws2, aes(x=id,y=value,group=variable)) +
  geom_line() + xlab("") + ylab("value") +
  ggtitle("The variance of a random walk process grows in time")
grid.arrange(p1, p2, ncol = 1)

Optional Adding an intercept to a random walk model

When we add an intercept to a random walk, it behaves very differently from a stationary process with an intercept added.

\[x_t = x_{t-1} + \alpha + e_t\]

looks very different than

\[x_t = \beta_1 x_{t-1} + \alpha + e_t\]

Adding an intercept to a random walk leads to a linear drift (upward if \(\alpha>0\)), while in the AR(1) model it leads to fluctuation around a flat level of \(\alpha/(1-\beta_1)\).

TT <- 100
intercept <- 1
trend <- beta1 <- 0.5
rwi <- rep(0,TT)
err <- rnorm(TT, sd=sd)
for(i in 2:TT) rwi[i] <- rwi[i-1] + intercept + err[i]
ar1i <- rep(intercept/(1-beta1),TT)
for(i in 2:TT) ar1i[i] <- beta1 * ar1i[i-1] + intercept + err[i]
par(mfrow=c(1,2))
plot(rwi, type="l")
plot(ar1i, type="l")

require(ggplot2)
require(gridExtra)
dat <- data.frame(t=1:TT, y=cumsum(rnorm(TT)))
dat$yi <- cumsum(rnorm(TT,intercept,1))
dat$yti <- cumsum(rnorm(TT,intercept+trend*1:TT,1))
p1 <- ggplot(dat, aes(x=t, y=y)) + geom_line() + ggtitle("Random Walk")
p2 <- ggplot(dat, aes(x=t, y=yi)) + geom_line() + ggtitle("with non-zero mean added")
p3 <- ggplot(dat, aes(x=t, y=yti)) + geom_line() + ggtitle("with linear trend added")

grid.arrange(p1, p2, p3, ncol = 1)

Dickey-Fuller test

The null hypothesis for this test is that the data are a random walk (non-stationary). If your data are stationary, you want the null hypothesis to be rejected.

Dickey-Fuller test on white noise

Let’s first do the test on data we know are stationary: white noise. We have to choose the type and the lags. If you have no particular reason not to include an intercept and trend, then use type="trend", which allows both an intercept and a trend. Next you need to choose the lags. The default of lags=1 is generally ok, but we will use lags=0 to fit a simpler model given that we don’t have many years of data.

The test fits this model to the data and tests whether the coefficient on z.lag.1 is 0.

x(t) - x(t-1) = z.lag.1 * x(t-1) + intercept + tt * t + e(t)

If you see *** or ** in the coefficients list for z.lag.1, it indicates that z.lag.1 is significantly different from 0, which supports stationarity. Note, however, that those stars are based on standard t-distribution p-values; the formal Dickey-Fuller test compares the test statistic against the tau3 critical values shown further down, because under the unit-root null the statistic does not follow a t-distribution.

The intercept and tt estimates indicate whether there is a non-zero level (intercept) or linear trend (tt).
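To see concretely what ur.df is fitting, the same regression can be run by hand with lm(). This is a sketch on simulated white noise; the variable names z.diff, z.lag.1, and tt are chosen to mirror the ur.df summary:

```r
set.seed(123)
TT <- 100
wn <- rnorm(TT)            # white noise, stationary by construction
z.diff <- diff(wn)         # x(t) - x(t-1)
z.lag.1 <- wn[-TT]         # x(t-1)
tt <- 1:(TT - 1)           # time index
fit <- lm(z.diff ~ z.lag.1 + tt)
# for stationary white noise the z.lag.1 coefficient is near -1
coef(fit)["z.lag.1"]
```

The estimate of the z.lag.1 coefficient matches the one ur.df reports (a constant shift in the tt index only changes the intercept, not the slopes).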

require(urca)
wn <- rnorm(TT)
test <- ur.df(wn, type="trend", lags=0)
summary(test)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.29972 -0.64649 -0.04202  0.65400  1.85465 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0604636  0.1965329  -0.308    0.759    
## z.lag.1     -0.8823532  0.1010937  -8.728    8e-14 ***
## tt           0.0004498  0.0034092   0.132    0.895    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9685 on 96 degrees of freedom
## Multiple R-squared:  0.4426, Adjusted R-squared:  0.431 
## F-statistic: 38.12 on 2 and 96 DF,  p-value: 6.538e-13
## 
## 
## Value of test-statistic is: -8.7281 25.4117 38.1163 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2  6.50  4.88  4.16
## phi3  8.73  6.49  5.47

The coefficients part of the summary indicates that z.lag.1 is different from 0 (so stationary) and there is no support for an intercept or trend.

Scroll down and notice that the test statistic is LESS than (more negative than) the critical value for tau3 at 5 percent. This means the null hypothesis of a random walk is rejected at \(\alpha=0.05\), a standard level for significance testing.
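Rather than reading the printed summary, you can pull the test statistic and critical values directly from the object ur.df returns (its S4 slots teststat and cval):

```r
library(urca)
set.seed(42)
wn <- rnorm(100)                       # stationary white noise
test <- ur.df(wn, type = "trend", lags = 0)
teststat <- test@teststat[1, "tau3"]   # the tau3 test statistic
crit <- test@cval["tau3", "5pct"]      # 5% critical value for tau3
teststat < crit                        # TRUE: reject the unit-root null
```

This is handy if you want to run the test on many series and collect the decisions programmatically.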

Dickey-Fuller on the anchovy data

test <- ur.df(anchovy, type="trend", lags=0)
summary(test)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32439 -0.12677  0.01971  0.13313  0.34676 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  6.09932    1.75245   3.480  0.00212 **
## z.lag.1     -0.72389    0.20947  -3.456  0.00225 **
## tt           0.04171    0.01336   3.123  0.00495 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1896 on 22 degrees of freedom
## Multiple R-squared:  0.3522, Adjusted R-squared:  0.2933 
## F-statistic: 5.981 on 2 and 22 DF,  p-value: 0.008431
## 
## 
## Value of test-statistic is: -3.4558 4.3568 5.9805 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -4.15 -3.50 -3.18
## phi2  7.02  5.13  4.31
## phi3  9.31  6.73  5.61

Scroll down to the critical values. The test statistic (-3.456) is greater than the tau3 critical value at 5 percent (-3.50). This means that we cannot reject, at \(\alpha=0.05\), the null hypothesis that the data contain a random walk.
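The tseries package (loaded at the top) offers adf.test(), which runs the same augmented Dickey-Fuller regression with a trend and reports a p-value directly. A sketch on simulated data, since the anchovy series requires the lab's data file; on a random walk the p-value should typically be large (fail to reject):

```r
library(tseries)
set.seed(1)
rw <- cumsum(rnorm(100))   # a random walk: the unit-root null is true
adf.test(rw, k = 0)        # k = 0 lags, matching lags = 0 above
```

A large p-value here is the same conclusion as the ur.df comparison against the tau3 critical values.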